Skip to content

[AHM] Prepare election-provider-multi-block for full lazy data deletion#8304

Merged
kianenigma merged 4 commits intomasterfrom
kiz-empb-lazy-deletion-1
Apr 25, 2025
Merged

[AHM] Prepare election-provider-multi-block for full lazy data deletion#8304
kianenigma merged 4 commits intomasterfrom
kiz-empb-lazy-deletion-1

Conversation

@kianenigma
Copy link
Copy Markdown
Contributor

@kianenigma kianenigma commented Apr 23, 2025

Follow-up to: #8127

This PR makes all the hefty storage items in multi-block and multi-block::verifier to be keyed by a round index (note that the same has already been done for multi-block::signed).

This allows us to stop deleting all the data upon a round ending, and instead simply know that it is stale, and just delete it later via #[pallet::task], or a free extrinsic.

As per #8127, the worst PoV weight of this pallet with the proposed configurations is around 4Mb uncompressed. With 10Mb PoV, this will likely be okay for now. Therefore, this PR is adding the infrastructure for lazy deletion, yet it is not activating it.

I am raising this in advance because

  1. It will prevent a data migration in the past. Migrating a Map to DoubleMap and so on in transit is a pain
  2. The polkadot-staking-miner better adapt itself already to read these storage items in the new format

A follow-up may replace OnRoundRotation with (), and add the transactions/#[pallet::task] needed to do the deletion lazily.

cc @niklasad1 @sigurpol for review and a 👍 that this will be incorporated in the miner.


Reflection

In the past months, I have been working on two pallets:

  1. pallet-staking-async, a lot of code for which I have written in 2019/2020, with little experience in FRAME.
  2. pallet-election-provider-multi-block, which I started writing in 2021, and resumed in 2025 (yes, 4 years later :D). But crucially, I can acknowledge that by this time I had a much better understanding of how to write FRAME pallets in a safe and extensible way.

And nowadays, I can feel the difference very clearly. Adding anything in pallet-staking-async seems unsafe, a million test cases break, but they are mostly false negatives, leading to uncertainty and more likelihood of missing the few that are actually false-positives. The code has patchy structure, and has a lot of assumptions that are, perhaps written down somewhere but are not particularly enforced.

Contrary, adding a new feature to pallet-election-provider-multi-block, like this PR does, feels like a smooth ride. I know the code has many checks in place, it will prevent me from making mistakes, the APIs are much clearer, and in general I have very little doubt of something breaking.

And I think this boils down to two practices that I deployed in the latter, but didn't have the expertise to do in the former:

  1. Composite storage items: Almost any complex FRAME pallet has a number of storage items that are related, and it helps A TON if you put them behind a type the does the reads and writes together, and asserts all invariants on the spot via a fn mutate_checked and/or asserting it in a fn try_state that is executed in all unit tests.
  2. I spent almost as much time and energy in developing the mock.rs as I did in developing the core pallet logic. Having a great test setup will definitely help. I often see developers giving very little attention to their mock.rs, it being a copy and paste of the neighboring pallet, and I find this to be counter-productive: It saves you time today, but it will have indefinite cumulative burden on you and others in the future.

I wrote more about all of these in a very old forum post: https://forum.polkadot.network/t/testing-complex-frame-pallets-discussion-tools/356#composite-semi-private-storage-types-3

@kianenigma kianenigma requested a review from a team as a code owner April 23, 2025 09:36
@kianenigma kianenigma changed the title Prepare election-provider-multi-block for full lazy data deletion [AHM] Prepare election-provider-multi-block for full lazy data deletion Apr 23, 2025
@kianenigma kianenigma added the R0-no-crate-publish-required The change does not require any crates to be re-published. label Apr 23, 2025
@sigurpol
Copy link
Copy Markdown
Contributor

sigurpol commented Apr 24, 2025

I will let @niklasad1 to proper comment on the actual changes 😅
If I get the main impact for the miner, I am fine in adapting the miner to handle key storage items by round index.

I am wondering what the impact for the clearing submission strategy is and if we still need to reclaim deposit explicitly with this new approach or if it is something that the chain can do for the miner instead.

Is the one below a fair summary for miner's Point of View?

Current behavior:

The staking miner needs to be careful about previous submissions potentially interfering with new ones during round transitions. When a new election round starts, the miner might need to explicitly clear or withdraw previous solutions (see paritytech/polkadot-staking-miner#1029).

New behavior with round-indexed storage:

  • Automatic round separation: Since all storage items are now keyed by round index, the chain automatically separates solutions from different rounds

  • No manual clearing needed: The miner no longer needs to explicitly clear previous submissions - they're automatically isolated in their own round-specific storage so potentially could give the deposit back without manual intervention of the miner. [I am really not sure if I get this point right!!!!]

  • Race condition prevention: Even if the miner is submitting a solution for round N while round N+1 starts, there's no conflict because they're stored separately

Or do I get it wrong, @kianenigma ?

@niklasad1
Copy link
Copy Markdown
Contributor

niklasad1 commented Apr 24, 2025

This shouldn't be an issue for the staking-miner rather than that we have to update the metadata and it only affects that multi-block verifier miner doesn't have to clean up old data?

Maybe worse db migrations etc

You got my 👍

@kianenigma
Copy link
Copy Markdown
Contributor Author

I am wondering what the impact for the clearing submission strategy is and if we still need to reclaim deposit explicitly with this new approach or if it is something that the chain can do for the miner instead.

The signed phase hasn't really changed in this PR -- the deletion of data was lazy, and it will remain as it was before.

No manual clearing needed: The miner no longer needs to explicitly clear previous submissions - they're automatically isolated in their own round-specific storage so potentially could give the deposit back without manual intervention of the miner. [I am really not sure if I get this point right!!!!]

So yea, this is not a correct statement.

The old signed submissions are already key-ed by round, and they can be deleted later. Also note that you don't have to do it asap, you will just get your deposit back when you clear it.

@sigurpol
Copy link
Copy Markdown
Contributor

I am wondering what the impact for the clearing submission strategy is and if we still need to reclaim deposit explicitly with this new approach or if it is something that the chain can do for the miner instead.

The signed phase hasn't really changed in this PR -- the deletion of data was lazy, and it will remain as it was before.

No manual clearing needed: The miner no longer needs to explicitly clear previous submissions - they're automatically isolated in their own round-specific storage so potentially could give the deposit back without manual intervention of the miner. [I am really not sure if I get this point right!!!!]

So yea, this is not a correct statement.

The old signed submissions are already key-ed by round, and they can be deleted later. Also note that you don't have to do it asap, you will just get your deposit back when you clear it.

Great, thank you, makes sense!

@sigurpol
Copy link
Copy Markdown
Contributor

No manual clearing needed: The miner no longer needs to explicitly clear previous submissions - they're automatically isolated in their own round-specific storage so potentially could give the deposit back without manual intervention of the miner. [I am really not sure if I get this point right!!!!]

So yea, this is not a correct statement.

The old signed submissions are already key-ed by round, and they can be deleted later. Also note that you don't have to do it asap, you will just get your deposit back when you clear it.

@kianenigma , got it. Not for now but in a distant future, what does prevent us to move the responsibility to clear / reclaim the deposit from the miner to the chain / pallet ? In theory when you do the lazy deletion and everything, you will have all the information about discarded solutions or not?

@kianenigma
Copy link
Copy Markdown
Contributor Author

miner to the chain / pallet

Yeah we could, but the problem is that when the pallet logic force-deletes everything all at once, you end up with a lot of storage operations in a single block. The blockchain programming model is kinda uncomfortable here, but we have some tools for it:

  1. on_idle hook
  2. pallet::tasks system
  3. and the most un-opinionated one, the chain just does nothing, but has some transactions that do the cleanup, and if done correctly, they are free

@kianenigma kianenigma added this pull request to the merge queue Apr 25, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 25, 2025
@kianenigma kianenigma enabled auto-merge April 25, 2025 12:07
@kianenigma kianenigma added this pull request to the merge queue Apr 25, 2025
Merged via the queue into master with commit 98c3154 Apr 25, 2025
239 of 249 checks passed
@kianenigma kianenigma deleted the kiz-empb-lazy-deletion-1 branch April 25, 2025 14:07
sigurpol added a commit to paritytech/polkadot-staking-miner that referenced this pull request Apr 25, 2025
`desired_targets` storage item now requires the round index
as a parameter, so this change correctly aligns our code with
the updated storage API in the Polkadot SDK.

See [SDK PR](paritytech/polkadot-sdk#8304).
niklasad1 pushed a commit to paritytech/polkadot-staking-miner that referenced this pull request Apr 25, 2025
`desired_targets` storage item now requires the round index
as a parameter, so this change correctly aligns our code with
the updated storage API in the Polkadot SDK.

See [SDK PR](paritytech/polkadot-sdk#8304).
sigurpol added a commit to paritytech/polkadot-staking-miner that referenced this pull request Apr 25, 2025
* Update multiblock metadata (tracking SDK rev 8da42e8fae)

`desired_targets` storage item now requires the round index
as a parameter, so this change correctly aligns our code with
the updated storage API in the Polkadot SDK.

See [SDK PR](paritytech/polkadot-sdk#8304).

* Handle round parameter in snapshot functions

[SDK #8304](https://github.com/paritytech/polkadot-sdk/pull/8304/files)
has introduced a breaking change for which also snapshot methods are
keyed on round index.

Update the code accordingly.

* Fix fmt
wassimans pushed a commit to wassimans/polkadot-sdk that referenced this pull request Apr 27, 2025
…tion (paritytech#8304)

Follow-up to: paritytech#8127

This PR makes all the hefty storage items in `multi-block` and
`multi-block::verifier` to be keyed by a round index (note that the same
has already been done for `multi-block::signed`).

This allows us to stop deleting all the data upon a round ending, and
instead simply know that it is stale, and just delete it later via
`#[pallet::task]`, or a free extrinsic.

As per paritytech#8127, the worst
PoV weight of this pallet with the proposed configurations is around 4Mb
uncompressed. With 10Mb PoV, this will likely be okay for now.
Therefore, this PR is adding the infrastructure for lazy deletion, yet
it is not activating it.

I am raising this in advance because 

1. It will prevent a data migration in the past. Migrating a Map to
`DoubleMap` and so on in transit is a pain
2. The `polkadot-staking-miner` better adapt itself already to read
these storage items in the new format

A follow-up may replace `OnRoundRotation` with `()`, and add the
transactions/`#[pallet::task]` needed to do the deletion lazily.

cc @niklasad1 @sigurpol for review and a 👍 that this will be
incorporated in the miner.


---

## Reflection

In the past months, I have been working on two pallets: 

1. `pallet-staking-async`, a lot of code for which I have written in
2019/2020, with little experience in FRAME.
2. `pallet-election-provider-multi-block`, which I started writing in
2021, and resumed in 2025 (yes, 4 years later :D). But crucially, I can
acknowledge that by this time I had a much better understanding of how
to write FRAME pallets in a safe and extensible way.

And nowadays, I can feel the difference very clearly. Adding anything in
`pallet-staking-async` seems unsafe, a million test cases break, but
they are mostly false negatives, leading to uncertainty and more
likelihood of missing the few that are actually false-positives. The
code has patchy structure, and has a lot of assumptions that are,
perhaps written down _somewhere_ but are not particularly enforced.

Contrary, adding a new feature to
`pallet-election-provider-multi-block`, like this PR does, feels like a
smooth ride. I know the code has many checks in place, it will prevent
me from making mistakes, the APIs are much clearer, and in general I
have very little doubt of something breaking.

And I think this boils down to **two practices** that I deployed in the
latter, but didn't have the expertise to do in the former:

1. **Composite storage items**: Almost any complex FRAME pallet has a
number of storage items that are related, and it helps A TON if you put
them behind a type the does the reads and writes together, and asserts
all invariants on the spot via a `fn mutate_checked` and/or asserting it
in a `fn try_state` that is executed in all unit tests.
2. I spent almost as much time and energy in developing the `mock.rs` as
I did in developing the core pallet logic. Having a great test setup
will definitely help. I often see developers giving very little
attention to their `mock.rs`, it being a copy and paste of the
neighboring pallet, and I find this to be counter-productive: It saves
you time today, but it will have indefinite cumulative burden on you and
others in the future.

I wrote more about all of these in a very old forum post:
https://forum.polkadot.network/t/testing-complex-frame-pallets-discussion-tools/356#composite-semi-private-storage-types-3
ordian added a commit that referenced this pull request Apr 28, 2025
* master: (120 commits)
  [CI] Improve GH build status checking (#8331)
  [CI/CD] Use original PR name in prdoc check for the backport PR's to the stable branches (#8329)
  Add new host APIs set_storage_or_clear and get_storage_or_zero (#7857)
  push to dockerhub (#8322)
  Snowbridge - V1 - Adds 2 hop transfer to Rococo (#7956)
  [AHM] Prepare `election-provider-multi-block` for full lazy data deletion (#8304)
  Check umbrella version (#8250)
  [AHM] Fully bound staking async (#8303)
  migrate parachain-templates tests to `gha` (#8226)
  staking-async: add missing new_session_genesis (#8310)
  New NFT traits: granular and abstract interface (#5620)
  Extract create_pool_with_native_on macro to common crate (#8289)
  XCMP: use batching when enqueuing inbound messages (#8021)
  Snowbridge - Tests refactor (#8014)
  Allow configuration of worst case buy execution weight (#7944)
  Fix faulty pre-upgrade migration check in pallet-session (#8294)
  [pallet-revive] add get_storage_var_key for variable-sized keys (#8274)
  add poke_deposit extrinsic to pallet-recovery (#7882)
  `txpool`: use tracing for structured logging (#8001)
  [revive] eth-rpc refactoring (#8148)
  ...
castillax pushed a commit that referenced this pull request May 12, 2025
…tion (#8304)

Follow-up to: #8127

This PR makes all the hefty storage items in `multi-block` and
`multi-block::verifier` to be keyed by a round index (note that the same
has already been done for `multi-block::signed`).

This allows us to stop deleting all the data upon a round ending, and
instead simply know that it is stale, and just delete it later via
`#[pallet::task]`, or a free extrinsic.

As per #8127, the worst
PoV weight of this pallet with the proposed configurations is around 4Mb
uncompressed. With 10Mb PoV, this will likely be okay for now.
Therefore, this PR is adding the infrastructure for lazy deletion, yet
it is not activating it.

I am raising this in advance because 

1. It will prevent a data migration in the past. Migrating a Map to
`DoubleMap` and so on in transit is a pain
2. The `polkadot-staking-miner` better adapt itself already to read
these storage items in the new format

A follow-up may replace `OnRoundRotation` with `()`, and add the
transactions/`#[pallet::task]` needed to do the deletion lazily.

cc @niklasad1 @sigurpol for review and a 👍 that this will be
incorporated in the miner.


---

## Reflection

In the past months, I have been working on two pallets: 

1. `pallet-staking-async`, a lot of code for which I have written in
2019/2020, with little experience in FRAME.
2. `pallet-election-provider-multi-block`, which I started writing in
2021, and resumed in 2025 (yes, 4 years later :D). But crucially, I can
acknowledge that by this time I had a much better understanding of how
to write FRAME pallets in a safe and extensible way.

And nowadays, I can feel the difference very clearly. Adding anything in
`pallet-staking-async` seems unsafe, a million test cases break, but
they are mostly false negatives, leading to uncertainty and more
likelihood of missing the few that are actually false-positives. The
code has patchy structure, and has a lot of assumptions that are,
perhaps written down _somewhere_ but are not particularly enforced.

Contrary, adding a new feature to
`pallet-election-provider-multi-block`, like this PR does, feels like a
smooth ride. I know the code has many checks in place, it will prevent
me from making mistakes, the APIs are much clearer, and in general I
have very little doubt of something breaking.

And I think this boils down to **two practices** that I deployed in the
latter, but didn't have the expertise to do in the former:

1. **Composite storage items**: Almost any complex FRAME pallet has a
number of storage items that are related, and it helps A TON if you put
them behind a type the does the reads and writes together, and asserts
all invariants on the spot via a `fn mutate_checked` and/or asserting it
in a `fn try_state` that is executed in all unit tests.
2. I spent almost as much time and energy in developing the `mock.rs` as
I did in developing the core pallet logic. Having a great test setup
will definitely help. I often see developers giving very little
attention to their `mock.rs`, it being a copy and paste of the
neighboring pallet, and I find this to be counter-productive: It saves
you time today, but it will have indefinite cumulative burden on you and
others in the future.

I wrote more about all of these in a very old forum post:
https://forum.polkadot.network/t/testing-complex-frame-pallets-discussion-tools/356#composite-semi-private-storage-types-3
sigurpol added a commit to paritytech/polkadot-staking-miner that referenced this pull request May 12, 2025
* Update multiblock metadata (tracking SDK rev 8da42e8fae)

`desired_targets` storage item now requires the round index
as a parameter, so this change correctly aligns our code with
the updated storage API in the Polkadot SDK.

See [SDK PR](paritytech/polkadot-sdk#8304).

* Handle round parameter in snapshot functions

[SDK #8304](https://github.com/paritytech/polkadot-sdk/pull/8304/files)
has introduced a breaking change for which also snapshot methods are
keyed on round index.

Update the code accordingly.

* Fix fmt
sigurpol added a commit to paritytech/polkadot-staking-miner that referenced this pull request May 13, 2025
* tests: add e2e tests for multi-block miner

* tweak config files for gha

* fix fmt

* Fix snaphot fn after metadata update (#1037)

* Update multiblock metadata (tracking SDK rev 8da42e8fae)

`desired_targets` storage item now requires the round index
as a parameter, so this change correctly aligns our code with
the updated storage API in the Polkadot SDK.

See [SDK PR](paritytech/polkadot-sdk#8304).

* Handle round parameter in snapshot functions

[SDK #8304](https://github.com/paritytech/polkadot-sdk/pull/8304/files)
has introduced a breaking change for which also snapshot methods are
keyed on round index.

Update the code accordingly.

* Fix fmt

* chore(deps): bump actions/download-artifact from 4.2.1 to 4.3.0 (#1039)

Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 4.2.1 to 4.3.0.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](actions/download-artifact@v4.2.1...v4.3.0)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-version: 4.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Do not build omni-node

* Check RPC port for the parachain collator before running the miner

Do not rely on specific text in polkadot logs but on the parachain
collator to be up and running on the RPC port specified on the
zombienet configuration.

* Fix wrong page index while checking page submissions

* Test successful <=> users is rewarded after score and all pages are submitted

* Cleanup

* Improve error handling

* Add a TODO around proper zombienet cleanup

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Paolo La Camera <paolo@parity.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

R0-no-crate-publish-required The change does not require any crates to be re-published.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants